Abstract

Wouter Swierstra's data types a la carte is a technique to modu- 
larise data type definitions in Haskell. We give an alternative im- 
plementation of data types a la carte that offers more flexibility in 
composing and decomposing data types. To achieve this, we re- 
fine the subtyping constraint, which is at the centre of data types 
a la carte. On the one hand this refinement is more general, allow- 
ing subtypings that intuitively should hold but were not derivable 
beforehand. This aspect of our implementation removes previous 
restrictions on how data types can be combined. On the other hand 
our refinement is more restrictive, disallowing subtypings that lead 
to more than one possible injection and should therefore be consid- 
ered programming errors. Furthermore, from this refined subtyping 
constraint we derive a new constraint to express type isomorphism. 
We show how this isomorphism constraint allows us to decompose 
data types and to define extensible functions on data types in an 
ad hoc manner. The implementation makes essential use of closed 
type families in Haskell. The use of closed type families instead 
of type classes comes with a set of trade-offs, which we review in 
detail. Finally, we show that our technique can be used for other 
similar problem domains. 

Categories and Subject Descriptors D.1.1 [Programming Tech-
niques]: Applicative (Functional) Programming

Keywords expression problem; closed type families; two-level 
types; modularity 

1. Introduction 

Data types a la carte (Swierstra 2008) is a simple, yet powerful
approach to defining data types and functions on them in a modular 
fashion. It provides a solution to the expression problem, which is 
"to define a datatype by cases, where one can add new cases to the 
datatype and new functions over the datatype, without recompiling 
existing code, and while retaining static type safety" (Wadler 1998).

The elegance of Swierstra's data types a la carte lies in its 
simplicity. It can be implemented and explained in a few lines of 
Haskell code. 

WGP '14, August 31, 2014, Gothenburg, Sweden. 

Copyright is held by the owner/author(s). Publication rights licensed to ACM. 
ACM 978-1-4503-3042-8/14/08. . .$15.00. 
http://dx.doi.org/10.1145/2633628.2633635 



Central to this technique is the idea of representing a recursive data
type as a two-level type (Sheard and Pasalic 2004) Fix f consisting
of a signature functor f and a knot-tying fixpoint constructor Fix.
As a consequence, modularity over the data type Fix f can be ex-
pressed in terms of the signature functor f. The key components to
achieve this are (1) the sum operator :+: that allows the program-
mer to combine two signatures f and g to form their sum f :+: g,
and (2) the binary constraint f :-<: g to express that a signature f is
subsumed by a signature g.

In this paper we present an alternative definition of the sub-
sumption relation :-<:. In its original form it has been defined as
a Haskell type class with two parameters. While this provides a 
clean and simple implementation it suffers from a severe restric- 
tion of Haskell's type class resolution: there is no backtracking. 

A consequence of this restriction is that we may not derive, for 
example, the following subsumption, even though every summand 
on the left also occurs on the right: 

f :+: (g :+: h) :-<: (f :+: g) :+: h

Even the simpler relation g :-<: (f :+: g) :+: h is out of reach. For
many small scale uses of data types a la carte, this restriction is
not an issue or can be worked around. However, in practice this
restriction creates a number of problems. The most severe of these
problems occur in the form of leaky abstractions: when refactoring
a signature functor f by splitting it into two components f_1, f_2 such
that f = f_1 :+: f_2, previous subsumption relations may not hold
anymore.

In order to overcome these restrictions and avoid the problems 
that stem from them, we implement the type constraint :-<: using 
the recently introduced closed type families (Eisenberg et al. 2014).
As we shall show, the resulting subsumption constraint :-<: is much 
more flexible and powerful. It permits new use cases such as an
isomorphism constraint :~: which allows the programmer to de-
compose and recombine signatures in an ad hoc manner.

In particular, the contributions of this paper are the following: 

• We define a binary type constraint :-<: that accurately charac-
terises the intuitive notion of signature subsumption, namely
such that f :-<: g iff each of the summands in f is unique and
has a unique counterpart in g.

• We demonstrate that this refinement of Swierstra's original 
definition of :-<: permits new use cases for data types a la
carte. In particular, it allows us to compose signature functors - 
and thus data types - more freely without giving up the utility 
provided by the subsumption relation. 

• With the refined version of :-<:, we are able to conservatively
characterise isomorphism of signature functors: we define the
constraint f :~: g as the conjunction of f :-<: g and g :-<: f.



• The isomorphism signature constraint :~: allows the program- 
mer to also decompose signatures more flexibly. The fact that a 
signature f can be decomposed into g and h can be expressed
as f :~: g :+: h. We demonstrate the utility of this constraint by
a number of examples. 

• We add restrictions to the subsumption constraint :-<: in order to
detect and avoid ambiguities that arise in the injection functions
that are derived from instances of f :-<: g. Such ambiguities arise
when a summand occurs multiple times in the right-hand side
g, e.g. in the case of the instance f :-<: f :+: f.

• We give an analysis of the costs and benefits of replacing Swier-
stra's original implementation with our implementation.

• Our technique is applicable to other similar problem domains. 
We illustrate this observation on extensible product types. 

The remainder of this paper is structured as follows: in section 2
we recap data types a la carte and demonstrate the problems that we
address in this paper. Section 3 is a brief primer on closed type fam-
ilies and their idiosyncrasies. Our implementation of :-<: is given
in three steps: in section 4 we give a simple backtracking variant
of Swierstra's definition; section 5 generalises this implementation
to allow arbitrary compound signatures on the left-hand side; and
section 6 presents our final implementation, which provides better
error messages and improves performance. In section 7 we review
the limitations of our implementation, discuss related work, and il-
lustrate other applications of our technique.

The subsumption constraint :-<: along with the surrounding in-
frastructure as presented in this paper has been implemented in the
compdata Haskell library available on Hackage (Bahr and Hvitved
2014). As the implementation relies essentially on closed type fam-
ilies, it requires the Glasgow Haskell Compiler (GHC) version 7.8.

2. Data Types a la Carte 

2.1 Defining Types and Functions 

Data types a la carte (Swierstra 2008) is based on the idea of
splitting a data type definition into a signature functor f and a knot-
tying type constructor Fix such that Fix f represents the original
data type: 

data Fix f = In (f (Fix f))

The benefit of this representation is that it reduces the problem of 
extending recursive data types to the problem of extending functors. 
The latter is easily achieved by the sum construction: 

data (f :+: g) a = Inl (f a) | Inr (g a)

For example, instead of defining a data type of simple arithmetic 
expressions by a recursive data type 

data Expr = Val Int | Add Expr Expr

we define the functor

data Arith a = Val Int | Add a a

and build the desired data type by taking the fixpoint of Arith:

type Expr = Fix Arith 

Arith is the signature of the type Expr. 

At a later point we can then extend Expr, e.g. with multiplica- 
tion, using the sum operator: 

data Mult a = Mult a a 

type Expr' = Fix (Arith :+: Mult) 

Functions on data types a la carte follow the same two-level 
approach as the type definitions. Instead of defining functions by 



recursion, they are defined as a fold of an algebra. That is, to define
a function of type Fix f -> c, we define a function of type
f c -> c, called an algebra, and lift it to the desired type by the
following combinator:

fold :: Functor f => (f c -> c) -> Fix f -> c
fold f (In x) = f (fmap (fold f) x)

The definitions of algebras then follow the compositional structure 
of signatures. To this end, one defines a type class and instantiates 
it for each signature separately. For instance, assume that we want 
to define an evaluation function for Expr'. We first define a type
class Eval, which contains an algebra of the appropriate type: 

class Eval f where
  evalAlg :: f Int -> Int

We then define the algebras for each of the atomic signatures by 
instantiating Eval: 

instance Eval Arith where
  evalAlg (Val n) = n
  evalAlg (Add x y) = x + y

instance Eval Mult where
  evalAlg (Mult x y) = x * y

We then lift the algebras to compound signatures: 

instance (Eval f, Eval g) => Eval (f :+: g) where
  evalAlg (Inl x) = evalAlg x
  evalAlg (Inr x) = evalAlg x

Eventually, we obtain the following modular definition of an 
evaluation function: 

eval :: (Eval f, Functor f) => Fix f -> Int
eval = fold evalAlg

Due to its modular definition, eval can be instantiated to work on 
both Expr and Expr': 

eval1 :: Expr -> Int
eval1 = eval
eval2 :: Expr' -> Int
eval2 = eval

This ability to define functions on data types a la carte in a 
modular fashion is complemented with the ability to build and 
deconstruct values of such modular data types, which we shall 
describe in section 2.2 below. The contributions of this paper lie
in this latter part of the infrastructure. However, as we shall see, 
the added expressiveness in constructing and deconstructing data 
types provides new ways of defining and combining functions on 
data types a la carte. 
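The definitions of this subsection fit into one small module. The following is a minimal sketch, assuming that the Functor instances are derived with the DeriveFunctor extension and that :+: is declared right-associative; the term sample is a hypothetical example that does not appear in the text above.

```haskell
{-# LANGUAGE TypeOperators, DeriveFunctor #-}

-- Two-level types: a signature functor plus a knot-tying fixpoint.
data Fix f = In (f (Fix f))

-- Sum of two signatures (fixity chosen here as an assumption).
infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a) deriving Functor

data Arith a = Val Int | Add a a deriving Functor
data Mult a = Mult a a deriving Functor

type Expr  = Fix Arith
type Expr' = Fix (Arith :+: Mult)

-- Lift an algebra to a function on the fixpoint.
fold :: Functor f => (f c -> c) -> Fix f -> c
fold alg (In x) = alg (fmap (fold alg) x)

-- One algebra per atomic signature, lifted to sums.
class Eval f where
  evalAlg :: f Int -> Int

instance Eval Arith where
  evalAlg (Val n)   = n
  evalAlg (Add x y) = x + y

instance Eval Mult where
  evalAlg (Mult x y) = x * y

instance (Eval f, Eval g) => Eval (f :+: g) where
  evalAlg (Inl x) = evalAlg x
  evalAlg (Inr x) = evalAlg x

eval :: (Eval f, Functor f) => Fix f -> Int
eval = fold evalAlg

-- Hypothetical sample term, built by hand: (1 + 2) * 3
sample :: Expr'
sample = In (Inr (Mult (In (Inl (Add one two))) three))
  where
    one   = In (Inl (Val 1))
    two   = In (Inl (Val 2))
    three = In (Inl (Val 3))
```

Building sample by hand requires spelling out every Inl and Inr, which is exactly the burden that the injection functions of the next subsection are meant to remove.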

2.2 Signature Subsumption 

In order to construct and deconstruct values, data types a la carte 
provides a binary type class :-<: on signatures that expresses that a
signature is subsumed by another one, e.g. 

Arith :-<: (Arith :+: Mult) 

The type class :-<: is equipped with methods that can be used to 
define the following two functions that enable the programmer to 
construct and deconstruct values from a compound data type: 

inject :: (f :-<: g) => f (Fix g) -> Fix g
project :: (f :-<: g) => Fix g -> Maybe (f (Fix g))

For example, we can use inject to lift the constructor Val to any
type that at least contains Arith, which yields the following smart 
constructor: 



val :: (Arith :-<: f) => Int -> Fix f
val i = inject (Val i)

Similarly, we can use project to pattern match any value of a type 
that contains Arith against the Val constructor: 

getVal :: (Arith :-<: f) => Fix f -> Maybe Int
getVal e = case project e of
  Just (Val i) -> Just i
  _            -> Nothing

To understand the behaviour of data types a la carte, we have to 
look at the definition of the type class :-<: and its instance declara-
tions. The class declares two methods that form the underpinning
of the implementation of inject and project: 

class f :-<: g where
  inj :: f a -> g a
  prj :: g a -> Maybe (f a)

The functions inject and project are defined in terms of these 
methods as follows: 

inject :: (f :-<: g) => f (Fix g) -> Fix g
inject = In . inj

project :: (f :-<: g) => Fix g -> Maybe (f (Fix g))
project (In g) = prj g

What makes the type class :-<: work are the instance declarations: 

instance f :-<: f where
  inj = id
  prj = Just

instance f :-<: (f :+: g) where
  inj = Inl
  prj (Inl f) = Just f
  prj (Inr g) = Nothing

instance (f :-<: g) => f :-<: (h :+: g) where
  inj = Inr . inj
  prj (Inr g) = prj g
  prj (Inl h) = Nothing

For the moment it is not important how inj and prj are imple-
mented. More interesting is how we obtain instances of f :-<: g. Of
particular importance is the apparent asymmetry of the treatment of 
the :+: operator: while :-<: is defined recursively for the right-hand 
side of :+:, we only have a non-recursive instance declaration for 
the left-hand side of :+:. 

As a consequence, :-<: can be characterised syntactically as
follows: we have an instance f :-<: g iff g is of the form

g_1 :+: (... :+: (g_{n-1} :+: g_n) ...)

and f = g_i for some 1 ≤ i ≤ n. To match this behaviour of :-<:,
the operator :+: is declared right-associative. Hence, we have that
f :-<: g iff g is of the form g_1 :+: ... :+: g_n and f = g_i for some
1 ≤ i ≤ n. However, we have to be careful as we do not have the
following subsumption for example:

f_1 :+: f_2 :-<: f_1 :+: f_2 :+: f_3

The problem is that the right-hand side is parenthesised as f_1 :+:
(f_2 :+: f_3). The common workaround for this problem is to split
constraints of the form f_1 :+: f_2 :-<: g into two constraints: f_1 :-<: g
and f_2 :-<: g.

In summary: if we want to use :-<: in order to express signature 
subsumption, we have to make sure that the left-hand side is an 
atomic signature (i.e. not formed by :+:) and that the right-hand 
side is a right-associative sum. 
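To make this behaviour concrete, the class and its instances can be assembled into a compilable sketch. Two details are assumptions of ours rather than part of the definitions above: the OVERLAPPING pragmas (the second and third instance overlap, so a current GHC needs overlap to be permitted in some form) and the fixity declarations.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

data Fix f = In (f (Fix f))

infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a)

data Arith a = Val Int | Add a a
data Mult a = Mult a a

-- Type class based subsumption, following the instances shown above.
infix 5 :-<:
class f :-<: g where
  inj :: f a -> g a
  prj :: g a -> Maybe (f a)

instance f :-<: f where
  inj = id
  prj = Just

-- Non-recursive case: f is the left summand.
instance {-# OVERLAPPING #-} f :-<: (f :+: g) where
  inj = Inl
  prj (Inl x) = Just x
  prj (Inr _) = Nothing

-- Recursive case: search only in the right summand.
instance {-# OVERLAPPING #-} (f :-<: g) => f :-<: (h :+: g) where
  inj = Inr . inj
  prj (Inr x) = prj x
  prj (Inl _) = Nothing

inject :: (f :-<: g) => f (Fix g) -> Fix g
inject = In . inj

project :: (f :-<: g) => Fix g -> Maybe (f (Fix g))
project (In x) = prj x

-- Smart constructor and pattern-matching helper for Val.
val :: (Arith :-<: f) => Int -> Fix f
val i = inject (Val i)

getVal :: (Arith :-<: f) => Fix f -> Maybe Int
getVal e = case project e of
  Just (Val i) -> Just i
  _            -> Nothing
```

With these definitions, val 3 :: Fix (Arith :+: Mult) type checks because Arith is an atomic summand of a right-nested sum. A left-hand side that is itself a sum, or a right-hand side that is nested to the left, finds no matching instance.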



In many use cases, these limitations of :-<: are unproblematic 
or can be worked around. However, as we shall demonstrate, these 
limitations do cause trouble for many other realistic use cases. 

Let's start with the restriction that signatures on the right-hand
side of :-<: must be right-associative sums. While this seems in-
nocuous at first, it clashes with abstraction and compositionality.
For example, given two concrete signatures Foo and Bar, we may 
want to form a new signature FooBar by summation as follows: 

type FooBar = Foo :+: Bar

However, if Foo was itself defined as the sum A :+: B, then we 
would not have that A :-<: FooBar. Hence, none of the smart con- 
structors of A can be used to construct a value of type Fix FooBar. 
In order to obtain this subsumption relation, we would have to de- 
fine FooBar in the following way, which breaks abstraction: 

type FooBar = A :+: B :+: Bar 

Furthermore, this restriction to right-associative sums hinders 
refactoring. For instance, we might want to refactor the definition 
of the signature Arith into two parts as follows: 

data Val a = Val Int
data Add a = Add a a

type Arith = Val :+: Add 

In practice, such refactoring may become necessary in order to 
avoid duplication. For example, we could now define a type of 
values, e.g. to define an evaluation function: 

type Value = Fix Val 

The types of the smart constructors for Val and Add are refactored 
accordingly, e.g. 

val :: (Val :-<: f) => Int -> Fix f

However, with this refactoring we do not have the anticipated
subsumption relation Val :-<: Arith :+: Mult, which renders the
smart constructor val useless for constructing expressions of type
Expr'. By contrast, we do have the instance Val :-<: Mult :+: Arith.
This counterintuitive behaviour is caused by the asymmetry in the
instance declarations for :-<:.

Also the restriction to atomic signatures on the left-hand side 
of :-<: appears harmless at first. For example, smart constructors, 
such as val defined above, always follow this pattern. Similarly, 
the project function is typically used for pattern matching, and thus 
atomic left-hand sides are sufficient. 

However, there are use cases that do require compound signa- 
tures on the left hand side. To illustrate this, we give a recursive 
variant of inject, which can be considered an upcasting operation: 

deepInject :: (f :-<: g, Functor f) => Fix f -> Fix g
deepInject = fold inject

The function deepInject uses the injection derived from f :-<:
g to upcast a complete value from signature f to signature g.
For example, we could imagine having an expression over integer
literals and multiplication, i.e. of type Fix (Val :+: Mult), and
wanting to turn it into an expression of type Expr'. We could use
deepInject to do so, provided that Val :+: Mult :-<: Arith :+: Mult.
Alas, this is not the case, even though we have that Val :-<: Arith.

Similarly we can define a function deepProject that performs
a downcasting operation (Bahr and Hvitved 2011). Its utility is
unfortunately equally reduced due to the limitation of :-<:.

Another shortcoming of the present implementation of signature 
subsumption can be seen in the type of the method prj: 

prj :: g a — > Maybe (f a) 



This method tries to cast a value of type g a to the "smaller" type
f a, returning Nothing if it fails. However, returning Nothing is
unsatisfying in some settings. If a value of type g a cannot be cast
to the type f a, we would like a proof of that in the form of a value
of type h a, where g is isomorphic to f :+: h. Given the signature
Arith :+: Mult, for example, we would like prj to have type

(Arith :+: Mult) a -> Either (Val a) ((Add :+: Mult) a)

instead of just

(Arith :+: Mult) a -> Maybe (Val a)

This refined projection method could, for example, be used to 
implement an evaluation function. We first split out the value part of 
the input signature, which is evaluated trivially, and then deal with 
the remainder of the signature - where actual evaluation is neces- 
sary - separately. In general, a more powerful projection function 
as outlined above could be used to define extensible functions in an 
ad hoc manner, without the need to use type classes. We shall see 
an example of such an ad hoc definition in the form of a desugaring 
function in section 5.2.
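For the one concrete signature mentioned above, a projection of the refined type can be written by hand. The sketch below is only an illustration of the intended type: the names splitVal and isValue are hypothetical, and nothing here is derived generically from a constraint.

```haskell
{-# LANGUAGE TypeOperators #-}

infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a)

data Val a = Val Int
data Add a = Add a a
data Mult a = Mult a a

type Arith = Val :+: Add

-- Hand-written split of (Arith :+: Mult) into the Val part and the rest.
-- The Right case carries the evidence that the input was not a Val.
splitVal :: (Arith :+: Mult) a -> Either (Val a) ((Add :+: Mult) a)
splitVal (Inl (Inl v)) = Left v
splitVal (Inl (Inr a)) = Right (Inl a)
splitVal (Inr m)       = Right (Inr m)

-- Hypothetical client: values need no further evaluation.
isValue :: (Arith :+: Mult) a -> Bool
isValue = either (const True) (const False) . splitVal
```

Unlike a Maybe result, the Right branch still holds the node, now typed over the remaining signature Add :+: Mult, so a caller can continue processing it without a catch-all case.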

In the remainder of this paper we present an alternative imple- 
mentation that resolves the issues we have described above. The 
implementation is presented in three steps from section 4 to sec-
tion 6. Since this implementation uses a fairly recent extension to
the Haskell language - closed type families - we give a brief in-
troduction to this new feature in section 3. Readers familiar with
closed type families in Haskell can safely skip that section.

3. Using Closed Type Families 

Type families (Chakravarty et al. 2005) extend the type language of
Haskell to allow the programmer to express limited forms of type- 
level computation: 

type family Element l

type instance Element [a] = a 
type instance Element Text = Char 

In the code above we first declare the type family Element that 
takes a single type l as an argument and returns a type. Then
we give two instances of this type family by giving appropriate 
mappings. Any list type [a] is mapped to the type a, and the type 
Text is mapped to the type Char. 

Type families are by nature partial: they do not necessarily
provide a mapping for each type. For example the type fam- 
ily Element does not provide a mapping for types of the form 
Array a. But type families are also open. That is, we can extend 
the definition - without recompiling the original code - by another 
mapping, e.g. 

type instance Element (Array a) = a 

This openness of type families makes them quite different from 
Haskell functions on the value level. 

Recently, Eisenberg et al. (2014) introduced closed type fam-
ilies and implemented them in the Glasgow Haskell Compiler
(GHC). In contrast to their open counterparts, closed type fami- 
lies are defined with a fixed sequence of equations that cannot be 
extended. Moreover, the order of the equations is relevant - simi- 
larly to function definitions in Haskell. For example, the following 
code defines a type family Curry that curries a function type of the 
appropriate form and otherwise does nothing: 

type family Curry t where
  Curry ((a, b) -> c) = a -> b -> c
  Curry a = a

Note that the two equations are overlapping, e.g. they both ap-
ply to the type (Int, String) -> Char. But the equations are
tried in order, and the first applicable equation is chosen. Hence,
Curry ((Int, String) -> Char) is simplified to Int -> String ->
Char. However, the semantics of closed type families is subtle.
For example, given a type variable t, the type Curry t does not
simplify to t as one might at first expect. The equations in a type
family are tried from top to bottom. But it is not sufficient that the
left-hand side matches in order to make the equation applicable. In
addition, it is required that none of the equations that appear before
it can match - for any instantiation of type variables.¹ The example
type t can be instantiated such that it matches the first equation,
namely by instantiating t to (a, b) -> c. Therefore, Curry t does
not simplify to t. By contrast, Curry (s, t) does indeed simplify
to (s, t).
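The reduction behaviour described above can be observed through type signatures: a definition is accepted only if the type family application in its signature reduces to the type of its right-hand side. The following sketch makes that check for two ground types; the names plain, curried and unchanged are hypothetical.

```haskell
{-# LANGUAGE TypeFamilies #-}

type family Curry t where
  Curry ((a, b) -> c) = a -> b -> c
  Curry a             = a

-- An ordinary curried function.
plain :: Int -> String -> Char
plain n s = toEnum (n + length s)

-- Accepted because the first equation applies:
-- Curry ((Int, String) -> Char) reduces to Int -> String -> Char.
curried :: Curry ((Int, String) -> Char)
curried = plain

-- Accepted because (Int, Bool) cannot match the first equation,
-- so the second equation applies and the type is left unchanged.
unchanged :: Curry (Int, Bool)
unchanged = (1, True)
```

By contrast, a definition such as identity :: t -> Curry t with identity x = x should be rejected, since Curry t does not reduce for an unconstrained type variable t.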

But closed type families go even beyond simple pattern match- 
ing by also allowing non-linear patterns, i.e. type variables may 
occur more than once on the left-hand side of equations. For exam- 
ple we may define the following type family that turns any product 
type of the form (a, a) into a function type Bool -> a:

type family Prod t where
  Prod (a, a) = Bool -> a
  Prod a = a

Closed type families are particularly handy for dealing with 
types produced by data type promotion (Yorgey et al. 2012), which
lifts (a limited class of) data types to the kind level. For example 
the data type definition for Bool 

data Bool = True | False 

yields also two types True and False, each of kind Bool. We can 
then define type families on types of kind Bool as we would define 
functions on type Bool. For example, we can define disjunction: 

type family Or a b where
  Or False False = False
  Or a b = True

As for open type families, we can provide explicit kind annota- 
tions to closed type family definitions: 



type family Or (a :: Bool) (b :: Bool) :: Bool where
  Or False False = False
  Or a b = True



An important fact to keep in mind is that the computations per- 
formed via type families happen all during compile time. Moreover, 
the results of these computations are not available during runtime. 
This complicates writing functions and terms that inhabit the types 
computed by type families. For instance, reconsider the type family 
Prod that we defined above. It maps any type of the form (a, a) to 
the type Bool -> a and any other type to itself. Thus, we have that
any type t is isomorphic to Prod t. In particular, we should be able
to write a function of type t -> Prod t that implements one direc-
tion of this isomorphism. However, the straightforward attempt to
implement this function fails: 



prod :: t -> Prod t
prod (x, y) = \b -> if b then x else y
prod x = x



GHC will complain that it 

couldn't match expected type 't' with actual
type '(t0, t0)'

What we really want is to pattern match on the type t to check
whether it is of the form (a, a) and then return a corresponding
mapping from t to Prod t. There are two methods to achieve this
in Haskell: use a GADT that reflects the type-level evidence to the
term level, or use a type class to dispatch on the result of the type
level computation.

¹ In practice, GHC only approximates this idea conservatively using the
notion of apartness (cf. Eisenberg et al. 2014).

For the first approach we introduce a GADT that reflects the 
pattern matching we would like to perform on the input type t: 

data Ty t t' where
  IsProd :: Ty (a, a) (Bool -> a)
  NotProd :: Ty t t

The first argument to Ty is the type we want to pattern match on 
and the second argument is the result of applying Prod to that type. 
In other words, the inhabitants of type Ty t t' are evidence that 
Prod t = t'. We can then write the desired function by pattern
matching on this evidence: 



prod' :: Ty t (Prod t) -> t -> Prod t
prod' IsProd (x, y) = \b -> if b then x else y
prod' NotProd x = x

We can then use a type class to infer the evidence automatically: 

class GetTy t t' where
  getTy :: Ty t t'

instance GetTy (a, a) (Bool -> a) where
  getTy = IsProd

instance GetTy a a where
  getTy = NotProd

Finally, we obtain the definition of the function prod by apply- 
ing prod' to the evidence provided by the function getTy: 

prod :: GetTy t (Prod t) => t -> Prod t
prod = prod' getTy 

This approach is described by Eisenberg et al. (2014) in their
implementation of a generic zipWith function. However, the con-
struction of explicit term-level evidence is unnecessary as it is im-
mediately consumed by prod'. Instead, we can use the type class
GetTy to directly construct the function prod' getTy:



class GetTy s t where
  prod' :: s -> t

instance GetTy (a, a) (Bool -> a) where
  prod' (x, y) = \b -> if b then x else y

instance GetTy a a where
  prod' x = x

prod :: GetTy t (Prod t) => t -> Prod t
prod = prod'

Apart from being clearer, this approach also avoids the additional 
pattern matching on the type Ty. The overhead due to this pattern 
matching is negligible in this toy example. But as term-level evi- 
dence becomes more complex, the overhead from pattern matching 
may become significant. Therefore, we shall use the direct approach
for the rest of this paper.
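A self-contained version of the direct approach might look as follows. This is a sketch under the assumption that the listed language extensions are enabled; the two instances do not overlap, because (a, a) and Bool -> a can never be the same type.

```haskell
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

type family Prod t where
  Prod (a, a) = Bool -> a
  Prod a      = a

-- The class dispatches on the pair (input type, result of Prod).
class GetTy s t where
  prod' :: s -> t

-- Homogeneous pairs become selection functions.
instance GetTy (a, a) (Bool -> a) where
  prod' (x, y) = \b -> if b then x else y

-- Any type that Prod leaves unchanged is returned as is.
instance GetTy a a where
  prod' x = x

-- The type family computes the second class argument from the first.
prod :: GetTy t (Prod t) => t -> Prod t
prod = prod'
```

At a use site such as prod (1 :: Int, 2 :: Int), the type Prod (Int, Int) reduces to Bool -> Int, which in turn selects the first instance; for prod 'c' the family returns Char and the second instance is chosen.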

4. Implementing Backtracking Subsumption 

The fundamental problem that we need to solve to improve the
definition of :-<: is to make it closed under summation from the left
and right. If we implement :-<: as a type class, we have to choose
one over the other, since there is no mechanism to backtrack. That
is, when checking whether f :-<: g_1 :+: g_2, we have to commit to
either checking f :-<: g_1 or f :-<: g_2. Haskell's type class system does
not allow us to try one and upon failure try the other.

To implement a backtracking variant of :-<: using closed type 
families, we implement a type family that takes two signature 



functors and checks whether the first is a summand of the second 
one. With the help of the type family Or defined in section 3 we
can implement such a type family quite easily: 

type family Elem (e :: * -> *) (f :: * -> *) :: Bool where
  Elem e e = True
  Elem e (l :+: r) = Or (Elem e l) (Elem e r)
  Elem e f = False

The constraint :-<: can then be implemented by defining the 
following synonym: 

type f :-<: g = Elem f g ~ True

That is, f is subsumed by g iff Elem f g is equal to True. The
above definition makes use of the ConstraintKinds extension of
Haskell to define f :-<: g as a synonym for Elem f g ~ True.
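The Boolean variant can be tried out in isolation. The following is a sketch under a few assumptions of ours: it uses Type from Data.Kind in place of *, ticks on promoted constructors, and a hypothetical helper subsumes that exists only to make the constraint observable at concrete types.

```haskell
{-# LANGUAGE TypeFamilies, DataKinds, KindSignatures, TypeOperators, ConstraintKinds #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a)

data Val a = Val Int
data Add a = Add a a
data Mult a = Mult a a

type Arith = Val :+: Add

type family Or (a :: Bool) (b :: Bool) :: Bool where
  Or 'False 'False = 'False
  Or a      b      = 'True

-- Search both summands; closed type families try the equations in order.
type family Elem (e :: Type -> Type) (f :: Type -> Type) :: Bool where
  Elem e e         = 'True
  Elem e (l :+: r) = Or (Elem e l) (Elem e r)
  Elem e f         = 'False

infix 5 :-<:
type f :-<: g = Elem f g ~ 'True

-- Hypothetical helper: a call type checks only where f :-<: g holds.
subsumes :: (f :-<: g) => Proxy f -> Proxy g -> Bool
subsumes _ _ = True

-- Both nestings are accepted, unlike with the type class version.
leftNested, rightNested :: Bool
leftNested  = subsumes (Proxy :: Proxy Val) (Proxy :: Proxy (Arith :+: Mult))
rightNested = subsumes (Proxy :: Proxy Val) (Proxy :: Proxy (Mult :+: Arith))
```

A call such as subsumes (Proxy :: Proxy Mult) (Proxy :: Proxy Arith) should be rejected, because Elem Mult Arith reduces to False and the equality with True cannot be solved.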

However, the above definition only covers one aspect of the 
original definition of :-<:. The original type class :-<: also provided
two functions inj and prj. With the above setup alone we do
not have any concrete type-level evidence to implement these two 
functions. Instead of only producing a Boolean as a result, the type 
family Elem must also provide evidence of the fact that the first 
argument is contained in the second argument. We will represent 
such evidence by the following kind Pos, which intuitively denotes 
the position of an occurrence found by Elem: 

data Pos = Here | Left Pos | Right Pos 

Note that we make use of Haskell's data type promotion facil-
ity (Yorgey et al. 2012) to use Pos as a kind. For example, Left
is used as a type constructor of kind Pos -> Pos.

Instead of using the kind Bool, we then use the following kind 
Res, which provides the position of the occurrence found in the 
second argument of Elem: 

data Res = Found Pos | NotFound

The definition of Elem is easily refactored to produce a type- 
level evidence of kind Res: 

type family Elem (e :: * -> *) (p :: * -> *) :: Res where
  Elem e e = Found Here
  Elem e (l :+: r) = Choose (Elem e l) (Elem e r)
  Elem e p = NotFound

type family Choose (l :: Res) (r :: Res) :: Res where
  Choose (Found x) y = Found (Left x)
  Choose x (Found y) = Found (Right y)
  Choose x y = NotFound

We replace the type family Or by the type family Choose, which 
produces an appropriate type-level evidence. 

Using the result produced by Elem, we can derive the inj
and prj functions. Following the approach outlined in section 3, we
define the following type class:

class Subsume (res :: Res) f g where
  inj' :: f a -> g a
  prj' :: g a -> Maybe (f a)

Subsume is the same as :-<: from section 2 except that it has an
additional type parameter of kind Res. With this setup we can
define the instance declarations that we want, namely by recursion
on the left- and the right-hand side of :+:. The additional argument
of kind Res acts as an oracle that tells Haskell's type class instance
resolution which instance declaration to take.

Unfortunately, we cannot use the type class Subsume as it is 
defined above since the type res does not occur in the type of either
of the class methods inj' and prj'. The solution is simple, though: we add
a dummy argument that mentions the type: 



data Proxy a = P

class Subsume (res :: Res) f g where
  inj' :: Proxy res -> f a -> g a
  prj' :: Proxy res -> g a -> Maybe (f a)

Providing instance declarations is easy now. The declarations 
follow the same idea as the original definition of :-<: from section 2.
The only exception is that the case for the left summand is now 
analogous to the case for the right summand: 

instance Subsume (Found Here) f f where
  inj' _ = id
  prj' _ = Just

instance Subsume (Found p) f l
    => Subsume (Found (Left p)) f (l :+: r) where
  inj' _ = Inl . inj' (P :: Proxy (Found p))
  prj' _ (Inl x) = prj' (P :: Proxy (Found p)) x
  prj' _ (Inr _) = Nothing

instance Subsume (Found p) f r
    => Subsume (Found (Right p)) f (l :+: r) where
  inj' _ = Inr . inj' (P :: Proxy (Found p))
  prj' _ (Inr x) = prj' (P :: Proxy (Found p)) x
  prj' _ (Inl _) = Nothing

The subsumption constraint :-<: is then defined as follows: 

type f :-<: g = Subsume (Elem f g) f g

This allows us to define the final injection and projection func- 
tions as follows: 

inj :: forall f g a . (f :-<: g) => f a -> g a
inj = inj' (P :: Proxy (Elem f g))

prj :: forall f g a . (f :-<: g) => g a -> Maybe (f a)
prj = prj' (P :: Proxy (Elem f g))

With this implementation we indeed obtain subsumption rela-
tions of the form²

g :-<: (f :+: g) :+: h

For instance, in the example from section 2 we have the anticipated
subsumption Val :-<: Arith :+: Mult. Recall that in the type class-
based implementation, we did not have this subsumption, but we
did have the subsumption Val :-<: Mult :+: Arith. With the above
closed type families-based implementation we get both.

However, this new implementation still suffers from the same 
problem of ambiguity as the original type class-based one: we can 
still derive subsumptions that permit more than one injection func-
tion, e.g. f :-<: f :+: f. Such subsumption relations are typically
unintended and we should try to avoid them and instead provide 
an error message to the programmer to inform her about the am- 
biguity. For instance, we may forget that the Arith signature al- 
ready contains the Val signature and try to derive the subsumption 
Val :-<: Arith :+: Val.

The implementation we have given in this section can be easily 
extended to check for ambiguity. Firstly, we have to extend the kind 
Res by another type to indicate ambiguity: 

data Res = Found Pos | NotFound | Ambiguous

Secondly, we extend the definition of the type family Choose by 
three additional equations: 

type family Choose (l :: Res) (r :: Res) :: Res where
  Choose (Found x) (Found y) = Ambiguous
  Choose Ambiguous y = Ambiguous
  Choose x Ambiguous = Ambiguous
  Choose (Found x) y = Found (Left x)
  Choose x (Found y) = Found (Right y)
  Choose x y = NotFound

² The signatures on either side have to be ground, though. This issue is
discussed in detail later in the paper.

The first equation detects ambiguities, while the second and 
third equation propagate any ambiguity that we have found. The 
remaining equations are the same ones we had before. Also the 
other definitions stay the same. 

With the thus amended definition, we indeed avoid ambiguous 
embeddings from multiple occurrences of the same summand. For 
instance, the constraint Val :-<: Arith :+: Val is no longer satisfied
and is rejected with the error message 

No instance for 

(Subsume Ambiguous Val (Arith :+: Val)) 

Rejecting ambiguous subsumptions is not strictly necessary for
correctness. The law that we would expect the derived functions inj
and prj to satisfy (Delaware et al. 2013) can be formulated as follows:

prj x = Just y  iff  inj y = x        (INVERSE)

Swierstra's original implementation as well as the implementation
given here (be it with checking for ambiguity or not) satisfy this
law. Nonetheless we argue that ambiguity is typically undesired
and should be considered an error.
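The definitions of this section can be assembled into one small module to try out the ambiguity check and the (INVERSE) law. The following is a minimal sketch, not the paper's original listing: the three-equation Elem is an assumed reconstruction of the section 4 definition (which precedes this passage), Proxy from Data.Proxy stands in for the paper's P, Data.Kind's Type replaces *, and the signatures Val, Add and Mult are simplified stand-ins for those of section 2.

```haskell
{-# LANGUAGE ConstraintKinds, DataKinds, FlexibleContexts, FlexibleInstances,
             KindSignatures, MultiParamTypeClasses, ScopedTypeVariables,
             TypeFamilies, TypeOperators, UndecidableInstances #-}

import Prelude hiding (Either (..))   -- Left and Right are reused as positions
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

infixr 6 :+:
infixl 5 :-<:

data (f :+: g) a = Inl (f a) | Inr (g a)

-- Simplified stand-ins for the example signatures of section 2.
newtype Val (a :: Type) = Val Int
data Add a  = Add a a
data Mult a = Mult a a
type Arith  = Val :+: Add

data Pos = Here | Left Pos | Right Pos
data Res = Found Pos | NotFound | Ambiguous

-- Assumed shape of the section 4 lookup.
type family Elem (e :: Type -> Type) (f :: Type -> Type) :: Res where
  Elem e e         = 'Found 'Here
  Elem e (l :+: r) = Choose (Elem e l) (Elem e r)
  Elem e f         = 'NotFound

-- Choose with the three additional ambiguity equations.
type family Choose (l :: Res) (r :: Res) :: Res where
  Choose ('Found x) ('Found y) = 'Ambiguous
  Choose 'Ambiguous y          = 'Ambiguous
  Choose x          'Ambiguous = 'Ambiguous
  Choose ('Found x) y          = 'Found ('Left x)
  Choose x          ('Found y) = 'Found ('Right y)
  Choose x          y          = 'NotFound

class Subsume (res :: Res) f g where
  inj' :: Proxy res -> f a -> g a
  prj' :: Proxy res -> g a -> Maybe (f a)

instance Subsume ('Found 'Here) f f where
  inj' _ = id
  prj' _ = Just

instance Subsume ('Found p) f l => Subsume ('Found ('Left p)) f (l :+: r) where
  inj' _ = Inl . inj' (Proxy :: Proxy ('Found p))
  prj' _ (Inl x) = prj' (Proxy :: Proxy ('Found p)) x
  prj' _ (Inr _) = Nothing

instance Subsume ('Found p) f r => Subsume ('Found ('Right p)) f (l :+: r) where
  inj' _ = Inr . inj' (Proxy :: Proxy ('Found p))
  prj' _ (Inr x) = prj' (Proxy :: Proxy ('Found p)) x
  prj' _ (Inl _) = Nothing

type f :-<: g = Subsume (Elem f g) f g

inj :: forall f g a . (f :-<: g) => f a -> g a
inj = inj' (Proxy :: Proxy (Elem f g))

prj :: forall f g a . (f :-<: g) => g a -> Maybe (f a)
prj = prj' (Proxy :: Proxy (Elem f g))

-- Val is found at position Left (Left Here) of (Val :+: Add) :+: Mult.
example :: (Arith :+: Mult) ()
example = inj (Val 3)
```

Changing the type of example to (Arith :+: Val) () should make Elem reduce to Ambiguous, so that no Subsume instance applies and the definition is rejected, in line with the error message shown above.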

The implementation that we presented in this section resolves
some of the issues that we have identified in section 2. In
particular, the implementation treats :+: symmetrically, and it
avoids ambiguous injections. However, it still does not allow
arbitrary sums on the left-hand side. For example, we cannot derive
the following subsumption:

Add :+: Mult :-<: Arith :+: Mult 

We should be able to derive the above subsumption since Arith 
subsumes Add. However, our implementation as well as the orig- 
inal implementation of Swierstra can only derive a subsumption if 
the left-hand signature appears as a summand in the right-hand side 
signature. In the next section we further refine our implementation 
to deal with this case. 

5. Subsumption for Compound Signatures 

In this section we generalise the implementation of the subsumption
constraint :-<: to allow compound signatures on the left-hand side.
This generalisation proves useful for a number of use cases. In
particular, it will allow us to define an isomorphism constraint
:~: on signatures. In this section we will give a straightforward
implementation of :-<: that has these properties. In section 6 we
shall give a revised implementation that provides better error
messages and has better performance properties.

5.1 Decomposing Compound Signatures 

Our first approach to generalise the subsumption constraint
implemented in section 4 to compound signatures on the left-hand side
follows a simple recipe: (1) decompose the left-hand side signature
into its atomic summands, and (2) use the subsumption constraint
from section 4 on these atomic summands.

The idea is to decompose the left-hand side signature f in a
constraint f :-<: g and then try to obtain an embedding using
Elem f' g for each component f' of f. To this end we introduce the
following kind Struc, which describes the structure of a
(potentially) compound signature and provides types of kind Res for
each atomic component in that structure:

data Struc = Sum Struc Struc | Atom Res



The following type family GetStruc performs the decomposi- 
tion on its first argument and refers to Elem once it has found an 
atomic signature: 

type family GetStruc f g :: Struc where
  GetStruc (f1 :+: f2) g = Sum (GetStruc f1 g)
                               (GetStruc f2 g)
  GetStruc f g           = Atom (Elem f g)

As before, we use a type class that traverses the evidence pro- 
duced by GetStruc in order to define the desired injection and pro- 
jection functions: 

class Subsume' (s :: Struc) f g where
  inj'' :: Proxy s -> f a -> g a
  prj'' :: Proxy s -> g a -> Maybe (f a)

instance Subsume res f g
    => Subsume' (Atom res) f g where
  inj'' _ x = inj' (P :: Proxy res) x
  prj'' _ x = prj' (P :: Proxy res) x

instance (Subsume' s1 f1 g, Subsume' s2 f2 g)
    => Subsume' (Sum s1 s2) (f1 :+: f2) g where
  inj'' _ (Inl x) = inj'' (P :: Proxy s1) x
  inj'' _ (Inr x) = inj'' (P :: Proxy s2) x

  prj'' _ x = case prj'' (P :: Proxy s1) x of
    Just y -> Just (Inl y)
    _      -> case prj'' (P :: Proxy s2) x of
                Just y  -> Just (Inr y)
                Nothing -> Nothing

For the case of an atomic signature we use the injection and
projection from the corresponding instance of Subsume, whereas in
the case of a sum we recurse on both summands.

We can then redefine the subsumption constraint :-<: as follows: 

type f :-<: g = Subsume' (GetStruc f g) f g

The injection and projection functions are redefined accordingly:

inj :: forall f g a . (f :-<: g) => f a -> g a
inj = inj'' (P :: Proxy (GetStruc f g))

prj :: forall f g a . (f :-<: g) => g a -> Maybe (f a)
prj = prj'' (P :: Proxy (GetStruc f g))

Now we are finally able to derive non-trivial subsumptions with 
a compound left-hand side, e.g. 

Val :+: Mult :-<: Arith :+: Mult

For example, we can use the deepInject function from section 2.2
to upcast any expression over Val :+: Mult into an expression over
Arith :+: Mult:

upcast :: Fix (Val :+: Mult) -> Fix (Arith :+: Mult)
upcast = deepInject

However, this implementation of :-<: is still not fully satisfactory.
Our implementation avoids ambiguity caused by subsumptions with
multiple occurrences of the same signature on the right-hand side,
e.g. Val :-<: Val :+: Val. Since we now allow compound signatures on
the left-hand side, the converse may happen: our implementation
happily derives that Val :+: Val :-<: Val.

This phenomenon is qualitatively worse than ambiguity, since it means
that the derived functions inj and prj do not satisfy the (INVERSE)
law. In particular, inj is not injective. The solution to this
problem is simple: we add another constraint to the definition of
:-<: that checks whether the left-hand side contains duplicates.
Figure 1 contains the implementation of the type family Dupl, which
checks for duplicate occurrences of the same atomic signature in a
given



type family Dupl (f :: * -> *) (l :: [* -> *]) :: Bool where
  Dupl (f :+: g) l = Dupl f (g ': l)
  Dupl f l         = Or (Find f l) (Dupl' l)

type family Dupl' (l :: [* -> *]) :: Bool where
  Dupl' (f ': l) = Or (Dupl f l) (Dupl' l)
  Dupl' '[]      = False

type family Find (f :: * -> *) (l :: [* -> *]) :: Bool where
  Find f (g ': l) = Or (Find' f g) (Find f l)
  Find f '[]      = False

type family Find' (f :: * -> *) (g :: * -> *) :: Bool where
  Find' f (g1 :+: g2) = Or (Find' f g1) (Find' f g2)
  Find' f f           = True
  Find' f g           = False



Figure 1. Checking for duplicate occurrences of signatures. 



signature. To this end, Dupl takes an additional worklist parameter
of kind [* -> *], i.e. a list of signatures. Dupl proceeds by
decomposing the argument, recursing on the left summand and
adding the right summand to the worklist. Once it reaches an atomic
signature, it checks whether this atomic signature occurs in one
of the signatures in the worklist. Moreover, it repeats the check
for every signature in the worklist. Note that ': and '[] denote the
constructors for type-level lists.
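To see the worklist traversal in action, the type families of Figure 1 can be loaded on their own and their result reflected to a value. The sketch below makes a few assumptions that are not in the paper: the definition of Or (the paper uses it without showing it), the helper class KnownBool and the function hasDupl (both hypothetical), and simplified example signatures. It uses Type from Data.Kind in place of *.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, KindSignatures, ScopedTypeVariables,
             TypeFamilies, TypeOperators, UndecidableInstances #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a)

-- Simplified stand-ins for the example signatures.
newtype Val (a :: Type) = Val Int
data Add a = Add a a
type Arith = Val :+: Add

-- Type-level disjunction (assumed definition).
type family Or (a :: Bool) (b :: Bool) :: Bool where
  Or 'True  b = 'True
  Or 'False b = b

-- Figure 1: walk down the left spine, pushing right summands onto the
-- worklist; at an atom, search the worklist and then check the worklist.
type family Dupl (f :: Type -> Type) (l :: [Type -> Type]) :: Bool where
  Dupl (f :+: g) l = Dupl f (g ': l)
  Dupl f l         = Or (Find f l) (Dupl' l)

type family Dupl' (l :: [Type -> Type]) :: Bool where
  Dupl' (f ': l) = Or (Dupl f l) (Dupl' l)
  Dupl' '[]      = 'False

type family Find (f :: Type -> Type) (l :: [Type -> Type]) :: Bool where
  Find f (g ': l) = Or (Find' f g) (Find f l)
  Find f '[]      = 'False

type family Find' (f :: Type -> Type) (g :: Type -> Type) :: Bool where
  Find' f (g1 :+: g2) = Or (Find' f g1) (Find' f g2)
  Find' f f           = 'True
  Find' f g           = 'False

-- Hypothetical helper: reflect a type-level Bool to a value.
class KnownBool (b :: Bool) where
  boolVal :: Proxy b -> Bool
instance KnownBool 'True  where boolVal _ = True
instance KnownBool 'False where boolVal _ = False

-- True iff some atomic signature occurs more than once in f.
hasDupl :: forall f . KnownBool (Dupl f '[]) => Proxy f -> Bool
hasDupl _ = boolVal (Proxy :: Proxy (Dupl f '[]))
```

For example, hasDupl (Proxy :: Proxy Arith) evaluates the worklist check for Val :+: Add, whereas Proxy (Add :+: Arith) contains Add twice and is flagged.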

We can thus refine the implementation of :-<: as follows:

type f :-<: g = (Subsume' (GetStruc f g) f g,
                 Dupl f '[] ~ False)

With this new definition, subsumptions such as Val :+: Val :-<: Val 
are not derivable anymore. 

5.2 Signature Isomorphisms 

The added generality of :-<: brings a new set of use cases for data
types a la carte. As we illustrated in section 2.2, the type of the
projection function prj is somewhat unsatisfying: given f :-<: g, the
projection prj either returns a value over signature f or it returns
Nothing. However, if projection into f fails, we have learned that
the input must be coercible into a signature h with g ~ f :+: h.

The new implementation of :-<: allows us to do just that by
giving us a means to express g ~ f :+: h constructively. In
particular, we can define the binary constraint :~: on signatures:

type f :~: g = (f :-<: g, g :-<: f)

That is, we define signature isomorphism as subsumption in both 
directions. 

We can now express that a signature f can be split into two
disjoint sub-signatures f1 and f2 as the constraint f :~: f1 :+: f2.
The following function split will allow us to do pattern matching
according to such a decomposition into two disjoint sub-signatures:

split :: (f :~: f1 :+: f2) =>
         (f1 a -> b) -> (f2 a -> b) -> f a -> b
split f1 f2 x = case inj x of
  Inl y -> f1 y
  Inr y -> f2 y

Note that we, in fact, only need one of the two subsumptions
that make up the isomorphism constraint in order to define split,
namely f :-<: f1 :+: f2. The inj function for this subsumption allows
us to map f a into (f1 :+: f2) a. The converse subsumption is only
needed to make sure that f1 and f2 do not contain any "junk", i.e.
signatures that are not already present in f.



class Desug f g where
  desugAlg :: f (Fix g) -> Fix g

instance (Add :-<: g) => Desug Dbl g where
  desugAlg (Double x) = inject (Add x x)

instance (Desug f1 g, Desug f2 g)
    => Desug (f1 :+: f2) g where
  desugAlg (Inl x) = desugAlg x
  desugAlg (Inr x) = desugAlg x

instance (f :-<: g) => Desug f g where
  desugAlg = inject

desugar :: (Desug f g, Functor f) => Fix f -> Fix g
desugar = fold desugAlg



Figure 2. Desugaring using type classes. 

5.2.1 Example: Desugaring 

To illustrate the utility of the isomorphism constraint and in partic- 
ular the split combinator, consider the following signature functor 

data Dbl a = Double a 

with the intended semantics that Double doubles its argument. 
This Double operator can be considered syntactic sugar for the 
arithmetic expression language Fix Arith, since we can translate 
Double x into Add x x. So we should be able to implement a 
desugaring function of type Fix f -> Fix g such that g is "f
without Dbl" and g contains at least Add. Using the power of data
types a la carte we can implement such a desugaring function. 

To do so, however, we have to follow the pattern described in
section 2.1, i.e. define a suitable type class and provide the
necessary instance declarations. Figure 2 gives the detailed
implementation. Moreover, the resulting type of the desugaring
function will not immediately describe the relationship between the
two signatures f and g. With the new isomorphism constraint :~: we
can do better and give a function with the following type, without
any additional type class infrastructure:

desugar :: (f :~: g :+: Dbl, Add :-<: g, Functor f)
        => Fix f -> Fix g

The type signature explains the relationship between f and g in a
direct and succinct way. The implementation itself is
straightforward. However, we have to give type annotations in order
to make explicit how f should be decomposed:

desugar = fold desugAlg

desugAlg :: (f :~: g :+: Dbl, Add :-<: g)
         => f (Fix g) -> Fix g
desugAlg = split (\x -> In x)
                 (\(Double x) -> inject (Add x x))

The algebra that is used to implement the desugaring uses split to
pattern match according to the isomorphism f :~: g :+: Dbl. The first
case of this pattern matching performs the trivial transformation
via In whereas the second case performs the desired desugaring of
Double.
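The desugaring example can be run end to end once a concrete implementation of :-<: is supplied. The sketch below is an illustration under stated assumptions rather than the paper's listing: it borrows the Elem and Subsume definitions from Figures 4 and 5, omits prj' and the duplicate check (split only needs inj), uses Proxy and Type from base in place of P and *, and introduces a hypothetical signature Sugared in which Dbl sits between Val and Add.

```haskell
{-# LANGUAGE ConstraintKinds, DataKinds, DeriveFunctor, FlexibleContexts,
             FlexibleInstances, KindSignatures, MultiParamTypeClasses,
             ScopedTypeVariables, TypeFamilies, TypeOperators,
             UndecidableInstances #-}

import Prelude hiding (Either (..))   -- Left and Right are reused as positions
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

infixr 6 :+:
infixl 5 :-<:, :~:

data (f :+: g) a = Inl (f a) | Inr (g a) deriving Functor
newtype Fix f = In (f (Fix f))

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alg (In t) = alg (fmap (fold alg) t)

newtype Val (a :: Type) = Val Int deriving Functor
data    Add a = Add a a           deriving Functor
newtype Dbl a = Double a          deriving Functor
type Arith   = Val :+: Add
type Sugared = (Val :+: Dbl) :+: Add   -- hypothetical example signature

data Pos = Here | Left Pos | Right Pos | Sum Pos Pos
data Res = Found Pos | NotFound | Ambiguous

type family Elem (f :: Type -> Type) (g :: Type -> Type) :: Res where
  Elem f f           = 'Found 'Here
  Elem f (g1 :+: g2) = Choose f (g1 :+: g2) (Elem f g1) (Elem f g2)
  Elem f g           = 'NotFound

type family Choose (f :: Type -> Type) (g :: Type -> Type)
                   (l :: Res) (r :: Res) :: Res where
  Choose f g ('Found x) ('Found y) = 'Ambiguous
  Choose f g 'Ambiguous y          = 'Ambiguous
  Choose f g x          'Ambiguous = 'Ambiguous
  Choose f g ('Found x) y          = 'Found ('Left x)
  Choose f g x          ('Found y) = 'Found ('Right y)
  Choose (f1 :+: f2) g x y         = Sum' (Elem f1 g) (Elem f2 g)
  Choose f g x y                   = 'NotFound

type family Sum' (l :: Res) (r :: Res) :: Res where
  Sum' ('Found x) ('Found y) = 'Found ('Sum x y)
  Sum' 'Ambiguous y          = 'Ambiguous
  Sum' x          'Ambiguous = 'Ambiguous
  Sum' x          y          = 'NotFound

-- Injection half of Figure 5; prj' is left out for brevity.
class Subsume (e :: Res) (f :: Type -> Type) (g :: Type -> Type) where
  inj' :: Proxy e -> f a -> g a

instance Subsume ('Found 'Here) f f where
  inj' _ = id
instance Subsume ('Found p) f g => Subsume ('Found ('Left p)) f (g :+: g') where
  inj' _ = Inl . inj' (Proxy :: Proxy ('Found p))
instance Subsume ('Found p) f g => Subsume ('Found ('Right p)) f (g' :+: g) where
  inj' _ = Inr . inj' (Proxy :: Proxy ('Found p))
instance (Subsume ('Found p1) f1 g, Subsume ('Found p2) f2 g)
    => Subsume ('Found ('Sum p1 p2)) (f1 :+: f2) g where
  inj' _ (Inl x) = inj' (Proxy :: Proxy ('Found p1)) x
  inj' _ (Inr x) = inj' (Proxy :: Proxy ('Found p2)) x

type f :-<: g = Subsume (Elem f g) f g
type f :~: g  = (f :-<: g, g :-<: f)

inj :: forall f g a . (f :-<: g) => f a -> g a
inj = inj' (Proxy :: Proxy (Elem f g))

inject :: (f :-<: g) => f (Fix g) -> Fix g
inject = In . inj

split :: (f :~: f1 :+: f2) => (f1 a -> b) -> (f2 a -> b) -> f a -> b
split h1 h2 x = case inj x of
  Inl y -> h1 y
  Inr y -> h2 y

desugAlg :: (f :~: g :+: Dbl, Add :-<: g) => f (Fix g) -> Fix g
desugAlg = split (\x -> In x) (\(Double x) -> inject (Add x x))

desugar :: (f :~: g :+: Dbl, Add :-<: g, Functor f) => Fix f -> Fix g
desugar = fold desugAlg

-- Double (Val 2) over Sugared becomes Add (Val 2) (Val 2) over Arith.
example :: Fix Arith
example = desugar (In (Inl (Inr (Double (In (Inl (Inl (Val 2))))))) :: Fix Sugared)
```

The annotation Fix Sugared together with the result type Fix Arith fixes both f and g, so the isomorphism Sugared :~: Arith :+: Dbl is resolved entirely at compile time.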

5.2.2 Example: Overriding Default Implementation 

Consider the implementation of a modular evaluation function eval 
shown in Figure 3. The implementation follows the typical pattern
for defining a function on data types a la carte: a type class that 
provides the underlying algebra is declared, instances are declared 
for the sum construction and each atomic signature, and finally the 
function is defined as a fold over the thus defined modular algebra. 



class Eval f where
  evalAlg :: f Int -> Int

instance (Eval f, Eval g) => Eval (f :+: g) where
  evalAlg (Inl x) = evalAlg x
  evalAlg (Inr x) = evalAlg x

instance Eval Add where
  evalAlg (Add x y) = x + y

instance Eval Dbl where
  evalAlg (Double x) = x + x

instance Eval Val where
  evalAlg (Val n) = n

eval :: (Eval f, Functor f) => Fix f -> Int
eval = fold evalAlg

Figure 3. Modular evaluation function. 

This approach yields a modular and extensible function defini- 
tion. However, the modularity is restricted as this setup does not 
allow us to replace one of the instance declarations. For example, 
if we wish to have an alternative evaluation function that evaluates 
Double x to 2 * x instead of x + x, we have to define a separate 
type class, duplicate all instance declarations (except the one for 
Dbl) and provide a new instance declaration for Dbl that imple- 
ments the alternative evaluation. 

Using the split combinator, we can override the evaluation
implementation for Dbl without writing a new evaluation function
from scratch. To achieve this, we split the signature f into the form
g :+: Dbl, use the default implementation for g, and provide a new
implementation for Dbl:

eval' :: forall f g . (f :~: g :+: Dbl, Eval g, Functor f)
      => Proxy g -> Fix f -> Int
eval' _ = fold evalAlg'
  where evalAlg' = split (\(x :: g Int) -> evalAlg x)
                         (\(Double x) -> 2 * x)

Note that we have to provide the type g that is used in the split as an
explicit type argument via a proxy. For example, we can instantiate
the above evaluation function to a concrete signature as follows:

evaluate :: Fix (Arith :+: Dbl) -> Int
evaluate = eval' (P :: Proxy Arith)

While use of split in the above two examples produces more
succinct code and avoids code duplication, one might expect that
it incurs a runtime performance penalty since the pattern matching
according to the isomorphism f :~: g :+: Dbl means that values over
f have to be first decomposed and then composed again to obtain
values over g :+: Dbl. To test this hypothesis, we have performed
a number of benchmarks using the Criterion Haskell library. We
tested extended implementations of the desugaring as well as the
evaluation example presented above. We were not able to see any
difference in the runtime between the implementations using split
and the implementations using type classes. Surprisingly, this still
holds as we increase the number of summands in the signatures. We
measured the runtime for examples using signatures with up to 25
summands and did not see any difference in runtime performance.

6. Improving Performance and Error Messages 

In this section we shall refine the implementation we presented in
section 5 in order to produce more efficient injection and projection
functions as well as more helpful error messages.



data Pos = Here | Left Pos | Right Pos | Sum Pos Pos
data Res = Found Pos | NotFound | Ambiguous

type family Elem (f :: * -> *) (g :: * -> *) :: Res where
  Elem f f           = Found Here
  Elem f (g1 :+: g2) = Choose f (g1 :+: g2)
                              (Elem f g1) (Elem f g2)
  Elem f g           = NotFound

type family Choose f g (l :: Res) (r :: Res) :: Res where
  Choose f g (Found x) (Found y) = Ambiguous
  Choose f g Ambiguous y         = Ambiguous
  Choose f g x         Ambiguous = Ambiguous
  Choose f g (Found x) y         = Found (Left x)
  Choose f g x         (Found y) = Found (Right y)
  Choose (f1 :+: f2) g x y       = Sum' (Elem f1 g) (Elem f2 g)
  Choose f g x y                 = NotFound

type family Sum' (l :: Res) (r :: Res) :: Res where
  Sum' (Found x) (Found y) = Found (Sum x y)
  Sum' Ambiguous y         = Ambiguous
  Sum' x         Ambiguous = Ambiguous
  Sum' x         y         = NotFound

Figure 4. Implementation of Elem. 



6.1 A More Efficient Implementation 

The implementation of :-<: from section 5 is a straightforward
extension of the simple implementation given in section 4: it
decomposes the left-hand side of a subsumption constraint into its
atomic components and then uses the simple implementation on each of
these atomic components. In some circumstances this approach
causes the derived implementations of inj and prj to perform
unnecessary decomposition and recomposition of their arguments.

For example, consider the seemingly innocuous subsumption
Arith :-<: Arith :+: Mult. Since Arith is defined as the sum
Val :+: Add, the function inj is effectively implemented as follows:

inj :: Arith a -> (Arith :+: Mult) a
inj (Inl x) = Inl (Inl x)
inj (Inr x) = Inl (Inr x)

It pattern matches on its argument only to reconstruct the original
argument again. Instead, inj could be implemented simply as Inl.

In order to achieve this behaviour, we shall refine the
implementation of the type family Elem such that it interleaves the
deconstruction of the left-hand side signature with the search for an
embedding into the right-hand side. The resulting implementation
of the Elem type family is shown in Figure 4.

The kind Res is defined as previously, but we have changed the
kind Pos to include a type constructor Sum. This additional type
constructor corresponds to the type constructor of the same name
for the kind Struc (cf. section 5.1). It indicates that the left-hand
side signature is a sum, and that we need to decompose it into its
two summands in order to find the desired embedding.

The definition of the type family Elem is similar to the original
definition of Elem in section 4. The only difference is that it
passes the two original signatures to the Choose type family. These
two additional arguments to Choose are needed for the additional
equation that was added compared to the original definition from
section 4, namely the equation

Choose (f1 :+: f2) g x y = Sum' (Elem f1 g) (Elem f2 g)



class Subsume (e :: Emb) (f :: * -> *) (g :: * -> *) where
  inj' :: Proxy e -> f a -> g a
  prj' :: Proxy e -> g a -> Maybe (f a)

instance Subsume (Found Here) f f where
  inj' _ = id
  prj' _ = Just

instance Subsume (Found p) f g
    => Subsume (Found (Left p)) f (g :+: g') where
  inj' _ = Inl . inj' (P :: Proxy (Found p))

  prj' _ (Inl x) = prj' (P :: Proxy (Found p)) x
  prj' _ _       = Nothing

instance Subsume (Found p) f g
    => Subsume (Found (Right p)) f (g' :+: g) where
  inj' _ = Inr . inj' (P :: Proxy (Found p))

  prj' _ (Inr x) = prj' (P :: Proxy (Found p)) x
  prj' _ _       = Nothing

instance (Subsume (Found p1) f1 g,
          Subsume (Found p2) f2 g)
    => Subsume (Found (Sum p1 p2)) (f1 :+: f2) g where
  inj' _ (Inl x) = inj' (P :: Proxy (Found p1)) x
  inj' _ (Inr x) = inj' (P :: Proxy (Found p2)) x

  prj' _ x = case prj' (P :: Proxy (Found p1)) x of
    Just y -> Just (Inl y)
    _      -> case prj' (P :: Proxy (Found p2)) x of
                Just y -> Just (Inr y)
                _      -> Nothing

Figure 5. Implementation of Subsume. 



Here we try to decompose the left-hand signature in case we were 
not able to find an embedding for the whole signature. Elem is 
used recursively on the two summands. If both yield a position, 
these positions are combined by Sum, otherwise Ambiguous and 
NotFound are propagated. 

For instance we have the following type equalities 

Elem Arith (Arith :+: Mult) ~ Found (Left Here)
Elem (Val :+: Mult) (Arith :+: Mult)
  ~ Found (Sum (Left (Left Here)) (Right Here))
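These two equalities can be checked mechanically by asking GHC to accept id at a type that is only well-formed if Elem reduces as claimed. The following sketch restates the type families of Figure 4 with ticked promoted constructors and Type in place of *; the names checkWhole and checkSplit and the simplified example signatures are hypothetical additions for illustration.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators,
             UndecidableInstances #-}

import Prelude hiding (Either (..))   -- Left and Right are reused as positions
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

infixr 6 :+:
data (f :+: g) a = Inl (f a) | Inr (g a)

-- Simplified stand-ins for the example signatures.
newtype Val (a :: Type) = Val Int
data Add a  = Add a a
data Mult a = Mult a a
type Arith  = Val :+: Add

data Pos = Here | Left Pos | Right Pos | Sum Pos Pos
data Res = Found Pos | NotFound | Ambiguous

type family Elem (f :: Type -> Type) (g :: Type -> Type) :: Res where
  Elem f f           = 'Found 'Here
  Elem f (g1 :+: g2) = Choose f (g1 :+: g2) (Elem f g1) (Elem f g2)
  Elem f g           = 'NotFound

type family Choose (f :: Type -> Type) (g :: Type -> Type)
                   (l :: Res) (r :: Res) :: Res where
  Choose f g ('Found x) ('Found y) = 'Ambiguous
  Choose f g 'Ambiguous y          = 'Ambiguous
  Choose f g x          'Ambiguous = 'Ambiguous
  Choose f g ('Found x) y          = 'Found ('Left x)
  Choose f g x          ('Found y) = 'Found ('Right y)
  Choose (f1 :+: f2) g x y         = Sum' (Elem f1 g) (Elem f2 g)
  Choose f g x y                   = 'NotFound

type family Sum' (l :: Res) (r :: Res) :: Res where
  Sum' ('Found x) ('Found y) = 'Found ('Sum x y)
  Sum' 'Ambiguous y          = 'Ambiguous
  Sum' x          'Ambiguous = 'Ambiguous
  Sum' x          y          = 'NotFound

-- Each definition type checks only if both Proxy indices are equal.
checkWhole :: Proxy (Elem Arith (Arith :+: Mult))
           -> Proxy ('Found ('Left 'Here))
checkWhole = id

checkSplit :: Proxy (Elem (Val :+: Mult) (Arith :+: Mult))
           -> Proxy ('Found ('Sum ('Left ('Left 'Here)) ('Right 'Here)))
checkSplit = id
```

In the first case the whole left-hand side is found as a summand, so no Sum position is produced; in the second case the search falls through to the Choose equation that decomposes Val :+: Mult.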

Finally, we need to adjust the type class Subsume to this
reimplementation of Elem. The implementation of Subsume is shown
in Figure 5. The instance declarations follow the structure of Pos:
Here produces a reflexive subsumption; Left and Right expect a
sum on the right-hand side and recurse on the left and the right
summand, respectively; and Sum expects a sum on the left-hand side
of the subsumption and recurses on both summands.

The definition of the constraint :-<: itself remains the same. In
particular, we can reuse the type family Dupl for checking for
duplicates on the left-hand side:

type f :-<: g = (Subsume (Elem f g) f g,
                 Dupl f '[] ~ False)

One can check that the derived implementations for inj and prj
indeed satisfy the (INVERSE) law.

6.2 Error Messages 

Our implementation of :-<: already produces quite helpful error
messages. For instance, consider the following function definition:

injVal :: Val a -> (Arith :+: Val) a
injVal = inj

The use of inj requires the subsumption Val :-<: Arith :+: Val,
which should be rejected since Val occurs twice in the right-hand 
side. GHC produces the following error message, which informs 
the programmer that Val is not subsumed by Arith :+: Val and 
that ambiguity is the culprit: 

No instance for 

(Subsume Ambiguous Val (Arith :+: Val)) 
arising from a use of 'inj' 

In the following example we try to use an injection that requires
Dbl :-<: Mult :+: Arith:

injDbl :: Dbl a -> (Mult :+: Arith) a
injDbl = inj

As this is not the case, GHC produces the following error message, 
informing the programmer that Dbl cannot be found in Mult :+: 
Arith, and thus there is no such subsumption: 

No instance for 

(Subsume 'NotFound Dbl (Mult :+: Arith)) 
arising from a use of 'inj' 

Compare this to Swierstra's original type class-based
implementation, which would produce the following error message:

No instance for (Dbl :<: Add) 
arising from a use of 'inj' 

This error message is not quite as helpful, since it does not indicate
the original subsumption relation that should be satisfied, namely
Dbl :-<: Mult :+: Arith. Giving this information can be quite
valuable. For example, maybe the error was caused by accidentally
using Mult instead of Dbl in the sum on the right-hand side.

While the Subsume type class produces reasonably helpful
error messages, the second part of the :-<: constraint, namely
Dupl f '[] ~ False, certainly does not. If we try to derive a
subsumption relation with duplicates on the left-hand side, e.g.
Val :+: Arith :-<: Arith, then GHC provides the error message:

Couldn't match type 'True' with 'False' 
In the expression: inj 

To circumvent this problem, we replace the equality check by a 
type class that has only one instance, namely for False. In addition, 
we also give it the signature that is checked for duplicates as an 
argument, so it will show up in error messages: 

type f :-<: g = (Subsume (Elem f g) f g,
                 NoDupl f (Dupl f '[]))

class NoDupl f s
instance NoDupl f False

With this definition we get the following more helpful error 
message: 

No instance for (NoDupl (Val :+: Arith) True) 
In the expression: inj 

Finally, we should note that the refined subsumption constraint
:-<: defined in this section is more liberal with ambiguous
embeddings compared to its previous version presented in section 5.
We redefined Elem such that it tries to find embeddings as early as
possible in order to avoid unnecessary decomposition of signatures.
As a consequence, we can derive the following subsumption:

Add :+: Val :-<: (Add :+: Val) :+: Val



Elem immediately returns Found (Left Here) without further 
decomposing the left-hand side signature. However, there are obvi- 
ously two ways of embedding Val from the left-hand side into the 
right-hand side signature. 

This issue can be avoided by also requiring that right-hand sides
do not contain duplicates. Thus we redefine :-<: one last time:

type f :-<: g = (Subsume (Elem f g) f g,
                 NoDupl f (Dupl f '[]),
                 NoDupl g (Dupl g '[]))

This definition is more restrictive than before as it also disallows
duplication on the right-hand side even though it is not in the image
of the embedding. For instance, we can no longer derive

Val :-<: Add :+: Add :+: Val

which was possible with the definition of subsumption from
section 5. As duplication of signatures on either side of the
subsumption relation is almost certainly unintentional, this more
restrictive behaviour is to be preferred.

7. Discussion 
7.1 Limitations 

The new implementation of the signature subsumption constraint 
:-<: improves the original implementation in many respects as we 
have shown throughout the paper. But, unfortunately, replacing 
type classes by type families has some drawbacks. 

Ground Signatures The most important limitation is that :-<: only
works for ground types, i.e. neither side may contain variables.
This is to be expected since we cannot rule out both ambiguity
and duplication if the signatures on either side of :-<: are not
fully instantiated. For example, we may not derive that
Val :-<: f :+: Val, since if f were instantiated by Val, then the
subsumption would be ambiguous.

Concretely, this restriction manifests itself in the implicit
requirement for apartness in the semantics of closed type families
(Eisenberg et al. 2014). Specifically, an equation of a closed
type family is applied only if it matches and is apart from any other
equation occurring above it (unless it would yield the same result).
Intuitively, the apartness requirement means that there is no
possible instantiation of type variables that would make a previous
equation applicable. (More correctly, it is a conservative
approximation of this intuition.)

For example, if we were to write the function 

valInj :: Val a -> (f :+: Val) a
valInj = inj

which requires the constraint Val :-<: f :+: Val to be derivable, the
simplification of the type Elem Val (f :+: Val) gets stuck at

Choose Val (f :+: Val) (Elem Val f) (Found Here)

The fifth equation for the type family Choose (cf. Figure 4)
matches. However, if f were instantiated to Val, then the first
equation would match; and if f were instantiated to Val :+: Val the
second equation would match. Therefore, we cannot (and should
not) apply the fifth equation.
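The effect of apartness can be reproduced in isolation. The sketch below uses the two-argument Choose (i.e. Figure 4's Choose without the f and g parameters, so the equation numbering is the same); the names leftWins, rightWins and stuck are hypothetical. The first two definitions are accepted because every earlier equation is apart from the arguments; the commented-out third one is expected to be rejected, for the reason given above.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

import Prelude hiding (Either (..))   -- Left and Right are reused as positions
import Data.Proxy (Proxy (..))

data Pos = Here | Left Pos | Right Pos
data Res = Found Pos | NotFound | Ambiguous

type family Choose (l :: Res) (r :: Res) :: Res where
  Choose ('Found x) ('Found y) = 'Ambiguous
  Choose 'Ambiguous y          = 'Ambiguous
  Choose x          'Ambiguous = 'Ambiguous
  Choose ('Found x) y          = 'Found ('Left x)
  Choose x          ('Found y) = 'Found ('Right y)
  Choose x          y          = 'NotFound

-- Accepted for every x: the second argument is known to be NotFound, so
-- equations one to three can never match, and equation four fires.
leftWins :: Proxy (Choose ('Found x) 'NotFound) -> Proxy ('Found ('Left x))
leftWins = id

-- Accepted: both arguments are ground, so equation five fires.
rightWins :: Proxy (Choose 'NotFound ('Found 'Here))
          -> Proxy ('Found ('Right 'Here))
rightWins = id

-- Expected to be rejected if uncommented: with a variable l, equation five
-- matches, but l could still become Found _ (equation one) or Ambiguous
-- (equation two), so the application does not reduce.
-- stuck :: Proxy (Choose l ('Found 'Here)) -> Proxy ('Found ('Right 'Here))
-- stuck = id
```

The stuck case is the same situation as Elem Val f in valInj: the unknown first argument of Choose prevents GHC from committing to the fifth equation.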

This restriction to ground signatures becomes even more apparent
for the Dupl type family (cf. Figure 1). Intuitively, it is clear that
we cannot rule out that a signature functor contains duplicates if it
contains a variable summand, as the variable may be instantiated
by Val :+: Val, say. Concretely, this can be seen in the definition
of Dupl. The type Dupl f l cannot be simplified if f is a variable:
the first equation of Dupl does not match, but it may match if f is
instantiated to a sum.



Error Messages Due to the apartness restriction of closed type
families, simplification of types may fail as we have described
above. This may lead to overly verbose error messages. For example,
if we ask GHC to type check the function definition for valInj
given above we receive the following error message:

No instance for 
(Subsume (Choose Val (f :+: Val) 

(Elem Val f) (Found Here)) 
Val (f :+: Val)) 
arising from a use of 'inj' 

Here the error message is polluted with the type that could not
be simplified further due to lack of apartness as described above.
Nonetheless, the error message still contains the relevant
information: there is no instance for Subsume (...) Val (f :+: Val),
i.e. Val is not subsumed by f :+: Val.

Apart from the unnecessary verbosity, error messages like the 
one above also expose the user of the library to implementation 
details that are not part of the API. In particular, the above error 
mentions the type class Subsume and the type families Choose 
and Elem, with which a user of the library should not be concerned. 

As a result, comprehending the error messages for our library 
requires some practice. Ideally, as library authors we would like to 
adjust the error messages that our library produces such that they 
adhere to the abstractions of the API and explain errors in terms of 
the domain of the library. Alas, GHC does not provide any interface 
that would allow such customisation of error messages. 

Recently, Christiansen (2014) presented a simple, reflection-based
mechanism to customise error messages in the dependently typed
functional programming language Idris (Brady 2013). With a
customisation interface for error messages similar to
Christiansen's, we would be able to drastically simplify error
messages, which would make our library much easier to use.

Compile Time Performance Using the implementation from section 5
we can easily deal with large signatures comprising 25 summands
without a noticeable delay in type checking. Unfortunately, we did
notice a significant impact on type checking performance with the
implementation from section 6: for a larger program using signatures
consisting of more than 10 summands, type checking becomes
impractically slow (in the order of minutes!).

We found that this performance bottleneck was caused by the
following equation for the Choose type family (cf. Figure 4):

Choose (f1 :+: f2) g x y = Sum' (Elem f1 g) (Elem f2 g)

To avoid this problem, we remove this equation and instead add the
following as the second equation for Elem:

Elem (f1 :+: f2) g = Sum' (Elem f1 g) (Elem f2 g)

This change also makes it possible to remove the first two
arguments from Choose, since they become unnecessary.

The resulting implementation would produce the same (suboptimal)
injection and projection functions as the implementation from
section 5. We can, however, restore the semantics of the original
implementation by post-processing the result of Elem appropriately.
This approach also allows us to remove the explicit check for
duplicates of the right-hand side signatures of subsumption
constraints. Moreover, checking for duplicates on the left-hand side
can be done by inspecting the result obtained from Elem, which yields
an additional speedup. As a result we get even better compile time
performance than the implementation from section 5, allowing us to
work with large signatures without problems.

7.2 Related Work 

The limitation of the original implementation of data types a la
carte is rooted in the fact that Haskell's search for suitable
instances does not backtrack. Morris and Jones (2010) proposed an
alternative to Haskell's overlapping type class instances, called
instance chains, that does perform backtracking. As demonstrated by
Morris and Jones (2010), instance chains can be used to give a
backtracking implementation of :-<:. In particular, they also give an
implementation that avoids ambiguity, i.e. subsumptions with multiple
possible injections. We expect that their backtracking implementation
of :-<: can be extended to also allow compound left-hand sides and
to express the isomorphism constraint :~:. Unfortunately, however,
instance chains have not been implemented in Haskell.

The theorem proving assistants Isabelle (Nipkow et al. 2002)
and Coq (Bertot and Castéran 2004) both implement a type
class system similar to Haskell's. Both systems, however, resolve
type class instances by backtracking (Nipkow and Snelting 1991;
Sozeau and Oury 2008). Thus the natural type class-based definition
of :-<: can be given directly in these systems.

7.3 Promoting Functions 

Our implementation uses data type promotion (Yorgey et al. 2012)
to promote data types such as Pos and Emb to the kind level 
such that we can define closed type families on the resulting kinds. 
Recently, Eisenberg and Stolarek (2014) introduced a library that 
promotes function definitions to closed type family definitions. 
This function promotion mechanism allows the programmer to 
use the familiar syntax of Haskell function definitions to define 
closed type families. In particular, the programmer may then use 
constructs like case and let, which are not supported in closed 
type family definitions. 

For example, we may define the type family Sum' from Figure 4
in the following way:

$(promote [d|
  sum' :: Emb -> Emb -> Emb
  sum' (Found x) (Found y) = Found (Sum x y)
  sum' Ambiguous y         = Ambiguous
  sum' x         Ambiguous = Ambiguous
  sum' x         y         = NotFound |])

The above code defines a function sum' with the specified type.
This definition is then passed to the promote function, which
generates a corresponding definition of a type family Sum'. The
resulting definition of Sum' is equivalent to the one given in
Figure 4.

Since Sum' is quite simple, we do not gain any advantage over
the original definition. It would be more helpful if we were able to
write Elem in this style (cf. Figure 4). A more natural definition
of Elem would replace the use of the helper type family Choose
with a case expression. Alas, we cannot use function promotion to
define Elem, since Elem is defined on kinds containing the kind *,
which has no counterpart at the type level. Similarly, the type
family Dupl in Figure 1 works on kinds containing * and is thus
out of reach for a definition via promotion.

7.4 Other Applications 

The implementation presented in this paper can be transferred 
easily to applications of similar structure. For instance, we can 
implement a variant of :-<: that works on types of kind * instead
of * -> *.

Note that while Haskell provides support for kind polymorphism
(Yorgey et al. 2012), we do need to re-implement :-<: and the
underlying machinery essentially for each kind we want to use it
on. This lack of polymorphism is due to the type constructor :+:.
According to the definition of :+:, the kind of signatures can be at
most generalised to the polymorphic kind k -> *.
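The kind restriction can be seen in a small experiment. The sketch below assumes the usual definition of :+: from data types a la carte and enables PolyKinds; the type Tag and the function isInl are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE PolyKinds, DataKinds, KindSignatures, TypeOperators #-}

infixr 6 :+:

-- With PolyKinds the argument a may have any kind k, but f a and g a
-- are constructor fields and therefore must have kind *.
-- Hence f and g have kind k -> *, and no more general kind is possible.
data (f :+: g) (a :: k) = Inl (f a) | Inr (g a)

-- A hypothetical signature indexed by a promoted Bool (k = Bool).
newtype Tag (b :: Bool) = Tag Int

isInl :: (f :+: g) a -> Bool
isInl (Inl _) = True
isInl (Inr _) = False

-- The same :+: used at k = * and at k = Bool.
ex1 :: (Maybe :+: []) Int
ex1 = Inl (Just 1)

ex2 :: (Tag :+: Tag) 'True
ex2 = Inr (Tag 2)
```

A sum of plain types of kind * does not fit the shape k -> *, which is why a separate coproduct and a separate subsumption constraint are needed for that case.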

More interestingly, we can also transfer :-<: from binary sums to
binary products, with the intended semantics that e :-<: p indicates
that every component of e is also a component of p. For instance,
(Int, Bool) :-<: (Bool, (Char, Int)). Using the technique described
in this paper, we can implement put and get functions as follows:

put :: (e :-<: p) => p -> e -> p
get :: (e :-<: p) => p -> e

These functions satisfy the expected equations: 

put p (get p) = p 
get (put p e) = e 
put (put p e) e' = put p e'
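One way such put and get functions could be derived is sketched below. This is a simplified illustration under stated assumptions, not the implementation referred to in the text: the constraint is spelled :<: in ASCII, the names Res, Elem, Choose, Both' and Access are hypothetical, and the check for duplicate components on the left-hand side is omitted.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, KindSignatures, TypeOperators,
             MultiParamTypeClasses, FlexibleInstances, FlexibleContexts,
             ScopedTypeVariables, ConstraintKinds, UndecidableInstances #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Position of a component inside a nested product (hypothetical names).
data Pos = Here | L Pos | R Pos | Both Pos Pos
data Res = Found Pos | NotFound | Ambiguous

-- Search for e in p; a compound e is looked up component by component.
type family Elem (e :: Type) (p :: Type) :: Res where
  Elem e e      = 'Found 'Here
  Elem (a, b) p = Both' (Elem a p) (Elem b p)
  Elem e (a, b) = Choose (Elem e a) (Elem e b)
  Elem e p      = 'NotFound

-- Combine the search results for the two halves of a product.
type family Choose (l :: Res) (r :: Res) :: Res where
  Choose ('Found x) ('Found y) = 'Ambiguous
  Choose 'Ambiguous y          = 'Ambiguous
  Choose x 'Ambiguous          = 'Ambiguous
  Choose ('Found x) y          = 'Found ('L x)
  Choose x ('Found y)          = 'Found ('R y)
  Choose x y                   = 'NotFound

type family Both' (l :: Res) (r :: Res) :: Res where
  Both' ('Found x) ('Found y) = 'Found ('Both x y)
  Both' 'Ambiguous y          = 'Ambiguous
  Both' x 'Ambiguous          = 'Ambiguous
  Both' x y                   = 'NotFound

-- Instances exist only for Found positions, so NotFound and Ambiguous
-- show up as unsolved constraints at compile time.
class Access (res :: Res) e p where
  get' :: Proxy res -> p -> e
  put' :: Proxy res -> p -> e -> p

instance Access ('Found 'Here) e e where
  get' _ x   = x
  put' _ _ e = e

instance Access ('Found pos) e a => Access ('Found ('L pos)) e (a, b) where
  get' _ (a, _)   = get' (Proxy :: Proxy ('Found pos)) a
  put' _ (a, b) e = (put' (Proxy :: Proxy ('Found pos)) a e, b)

instance Access ('Found pos) e b => Access ('Found ('R pos)) e (a, b) where
  get' _ (_, b)   = get' (Proxy :: Proxy ('Found pos)) b
  put' _ (a, b) e = (a, put' (Proxy :: Proxy ('Found pos)) b e)

instance (Access ('Found p1) e1 p, Access ('Found p2) e2 p)
      => Access ('Found ('Both p1 p2)) (e1, e2) p where
  get' _ x = ( get' (Proxy :: Proxy ('Found p1)) x
             , get' (Proxy :: Proxy ('Found p2)) x )
  put' _ x (e1, e2) =
    put' (Proxy :: Proxy ('Found p2)) (put' (Proxy :: Proxy ('Found p1)) x e1) e2

type e :<: p = Access (Elem e p) e p

get :: forall e p. (e :<: p) => p -> e
get = get' (Proxy :: Proxy (Elem e p))

put :: forall e p. (e :<: p) => p -> e -> p
put = put' (Proxy :: Proxy (Elem e p))

-- A sample product used to try out get and put.
sample :: (Bool, (Char, Int))
sample = (True, ('c', 3))
```

With these definitions, get sample :: (Int, Bool) selects the two components by type alone, and put sample (7 :: Int) replaces only the Int component. A request such as get ((1, 2) :: (Int, Int)) :: Int is rejected by the type checker, because Elem reduces to 'Ambiguous and no Access instance applies.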

This setup is especially useful for implementing automata in a 
modular fashion (Bahr 2012), as it allows us to easily combine state
spaces of different automata using binary products. 

More generally, binary products with automatically derived put 
and get functions as described above can be used as a lightweight 
alternative to the implementation of extensible records of Kiselyov 
et al. (2004). It is lightweight, as it does not require giving type-
level identifiers to the components of the extensible record/product
type. Instead, our implementation uses the type information in 
order to select the right component. 

Implementing extensible product types by dispatching on the 
type information alone is typically not a good choice as it is error- 
prone. For example, consider the following selector function: 

getInt :: (Age, Int) -> Int
getInt = get

It may seem obvious what the semantics of getInt is. But what
happens if Age happens to be defined by 

type Age = Int 

There is no obvious choice whether getInt should return the first
or the second component. Luckily, with our implementation this
situation cannot occur. The detection of ambiguities that we
implemented for the subsumption constraint on signatures carries over to
this implementation as well. In the above situation, the programmer
would receive an error message. She would then have to resolve the
problem by defining Age as a newtype instead.
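The difference between the two ways of defining Age can be observed at the type level. The following is a small sketch; the names AgeSyn, AgeNew and Same are hypothetical and serve only to contrast a type synonym with a newtype.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, KindSignatures #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

type AgeSyn = Int              -- a synonym: the same type as Int
newtype AgeNew = AgeNew Int    -- a newtype: a distinct type

-- Type-level equality test expressed as a closed type family.
type family Same (a :: Type) (b :: Type) :: Bool where
  Same a a = 'True
  Same a b = 'False

-- Both definitions type check only if Same reduces as stated.
synIsInt :: Proxy (Same AgeSyn Int) -> Proxy 'True
synIsInt = id

newIsDistinct :: Proxy (Same AgeNew Int) -> Proxy 'False
newIsDistinct = id
```

Since the synonym is indistinguishable from Int, a product such as (AgeSyn, Int) contains two candidates for an Int component, whereas (AgeNew, Int) contains exactly one.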

Kiselyov et al. (2004) implement a similar idea in the form of
type-indexed products. They use type classes to implement a
constraint that checks for duplication. However, their products are
always list-like and have no additional structure. Our implementation
retains the nested structure of the binary products. As mentioned
above, we are able to derive the subtyping (Int, Bool) :-<:
(Bool, (Char, Int)), which thus yields a get function of type
(Bool, (Char, Int)) -> (Int, Bool). Using the subtyping constraint
we can also implement an isomorphism constraint :~: such
that we have for example

(Int, (Char, Bool)) :~: (Bool, (Char, Int)) 

together with automatically derived functions that witness the iso- 
morphism. 

We have used an implementation of extensible product types 
as described above in an embedding of attribute grammars in 
Haskell (Bahr and Axelsson 2014). The fact that components are 
selected according to the type information makes it easier to combine
attribute grammar fragments in a modular fashion compared
to an implementation that uses extensible records a la Kiselyov
et al. (2004), such as the embedding by Viera et al. (2009).